- GSBUILD.DOC 04/04/94
-
- This file contains information on programs in the GSBUILD/GSSEARCH
- system.
-
-
- PROCEDURES
-
-
-
-
- PROGRAMS
-
-
- GBMK_ADR.EXE
-
- Reads the binary db_name.LV4 file and translates it into an ASCII
- data offsets file (default name = SIZE.TXT).
-
- GBMK_CFG then reads db_name.TMP and SIZE.TXT to make db_name.CFG.
-
- NOTE: db_name is the filename path\prefix of the data base.
-
- Generates GBMK_ADR.ERR file if a processing error is encountered.
-
-
- GBMK_ADR Command Line Format:
-
- GBMK_ADR db_name.LV4 siz_file <ENTER>
-
- db_name.LV4: See GSSEARCH.DOC.
-
- siz_file: See SIZE.TXT information.
-
- Versions:
-
- 1.0 93/08/23
-
-
- GBMK_CFG.EXE
-
- Reads temporary configuration file (db_name.TMP) and data offsets file
- (SIZE.TXT) and makes a configuration file for GSSEARCH.
-
- NOTE: db_name is the filename path\prefix of the data base.
-
- The configuration file name is taken from the first line of the
- db_name.TMP file. Copies the pointer and data file names, the
- numbers of fields, indexes, and browse formats, and the title line
- from db_name.TMP to the db_name.CFG file. Then for each index field
- copies the index names, types, and data field number and merges the
- pointer data for that field from SIZE.TXT, and outputs the combined
- line to db_name.CFG file. Then copies the data field information and
- browse format information from the db_name.TMP file to the
- db_name.CFG file.
-
- Generates GBMK_CFG.ERR file if a processing error is encountered.
-
- GBMK_CFG Command Line Format:
-
- GBMK_CFG tmp_file siz_file <ENTER>
-
- tmp_file: Temporary configuration file for the data
- base.
- Default name: db_name.TMP.
-
- siz_file: ASCII file produced by GBMK_ADR.EXE.
- Default name: SIZE.TXT.
-
- Versions:
-
- 1.5 94/04/04
- Revised to accommodate inclusion of # of decimal places in
- numeric field length.
- 1.4 93/10/21
- Added test for too many lines of data in SIZE.TXT file.
- Added removal of bad .CFG file if errors are encountered.
- 1.3 93/09/23
- Fixed left justification of index and field names.
- 1.2 93/08/15
- Revised command line message trigger and added output
- of GBMK_CFG.ERR file.
- 1.0 93/07/14
-
-
- GBMK_GBG.EXE
-
- Reads TAGGED TEXT files and db_name.SCR file and generates the
- db_name.GBG file.
-
- NOTE: db_name is the filename path\prefix of the data base.
-
- Generates GBMK_GBG.ERR file if a processing error is encountered.
-
-
- GBMK_GBG Command Line Format:
-
- GBMK_GBG inf_file db_name scr_file [/a] <ENTER>
-
- inf_file: ASCII file containing the path/names of
- input tagged data files. GBMK_GBG will process
- all of the listed files, to produce a single
- db_name.GBG file.
- Default name: db_name.INF
-
- db_name: This is the path/prefix to which .GBG will be
- appended to make the db_name.GBG file.
-
- scr_file: Name of the script file for the data base.
- Default name: db_name.SCR
-
- /a Switch to append to an existing .GBG file (optional)
-
- Versions:
-
- 2.0 94/02/18
- Revised record counter.
- 1.9 94/01/18
- Added tests for disk space.
- 1.8 93/12/21
- Added skip of 0x00 stopchar if Format == 0.
- 1.7 93/10/13
- Increased size of GBG buffer to match TAG buffer.
- 1.6 93/10/13
- Modified to insert 0xFF between paragraphs of non-formatted
- field.
- 1.5 93/09/28
- Modified parsing of non-formatted field to remove extra
- spaces after CR/LF.
- 1.4 93/08/30
- Made ReadTagFile function and removed tests for reads
- after each increment of TagPos. Tag file is read when
- < 0x6000 bytes are left in buffer (1 max size rec +
- 512 field tags.)
- 1.3 93/08/19
- Made ReadTagFile function and added tests for reads
- after each increment of TagPos.
- 1.2 93/08/15
- Revised command line message trigger and added output
- of GBMK_GBG.ERR file.
- 1.1 93/07/14
-
-
- GBMK_TAG.EXE
-
- Converts dBase file to a TAGGED.TXT file.
- Deleted records (marked by a non-space in the delete-rec-marker
- byte) are not processed.
- Null (empty) fields are not converted.
- Numeric fields that have all zeros or zeros with decimal point will
- not be converted if the /0 flag is set in the command line.
- Memo file data is translated as is. There is no conversion of
- extended ASCII characters.
-
- Generates GBMK_TAG.ERR file if a processing error is encountered.
-
- GBCHKTAG.EXE may be used to check a .TAG file for errors, and to
- generate a list of field names.
-
- Command line format:
-
- GBMK_TAG dbf_file tag_file [/n=numrecs] [/0] <ENTER>
-
- dbf_file: The path\name of the input DBF file.
-
- tag_file: The path\name of the output tagged text file.
-
- numrecs: Number of records to convert, default = all records.
-
- option: /0 Ignore null (0) numeric fields.
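-
- The conversion rules above can be sketched as follows. This is an
- illustration only, not the GBMK_TAG source: record_to_tagged and its
- (name, type, value) tuple layout are hypothetical, and the output
- shape is assumed from the db_name.TAG example later in this file.
-
```python
import re

# A numeric value made only of zeros, with an optional decimal point
# (for example "0", "000", "0.00", ".0").
_ALL_ZERO = re.compile(r"^(?=.*0)0*\.?0*$")


def record_to_tagged(delete_marker, fields, skip_zero=False):
    """Return tagged-text lines for one record, or [] if it is deleted.

    delete_marker: the one-character delete-rec-marker (" " = live record).
    fields: list of (name, dbf_type, raw_value) tuples (hypothetical layout).
    skip_zero: mirrors the /0 command line flag.
    """
    if delete_marker != " ":
        return []                      # deleted records are not processed
    lines = []
    for name, dbf_type, raw in fields:
        if raw.strip() == "":
            continue                   # null (empty) fields are not converted
        if skip_zero and dbf_type == "N" and _ALL_ZERO.match(raw.strip()):
            continue                   # /0: drop all-zero numeric fields
        lines.append("{%s}%s" % (name, raw))
    lines.append("{EOR}")              # every record ends with {EOR}
    return lines
```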
-
- Versions:
-
- 2.3 94/03/23
- Added command line input of number of records to convert.
- 2.2 94/02/18
- Revised record counter.
- 2.1 94/01/18
- Revised error messages.
- 2.0 93/08/15
- Fixed bugs in text/binary reads.
- Added skip of deleted record.
- Added seek to end of header.
- 1.0 93/07/14
-
-
- GB_INDEX.EXE
-
- Reads db_name.SCR file and spawns the programs that build the data base.
-
- NOTE: db_name is the filename path\prefix of the data base.
-
- GB_INDEX command line format:
-
- GB_INDEX inf_file db_name scr_file <ENTER>
-
- inf_file: ASCII file containing the path/names of
- input tagged data files. GBMK_GBG will process
- all of the listed files, to produce a single
- db_name.GBG file.
- Default name: db_name.INF
-
- db_name: This is the path/prefix to which .GBG will be
- appended to make the db_name.GBG file.
-
- scr_file: Name of the script file for the data base.
- Default name: db_name.SCR
-
- Versions:
-
- 2.1 94/03/08
- Added del SORTED.TXT, del MERGED.TXT.
- 2.0 93/12/15
- Modified to use GBSORTC, GBSORTN, etc.
- Added check for existence of necessary programs.
- 1.0 93/11/01
-
-
- GBXTRACT.EXE
-
- Reads db_name.GBG file, db_name.SCR file and STOPWORD.LST file.
- Extracts field(s) and generates XTRACTED.TXT file. Extracted search
- terms are parsed according to the stop characters specified in the
- .SCR file, and are compared to the stop words in STOPWORD.LST if
- the stop word flag is set. For non-numeric fields, spaces at the
- beginning of each extracted term are skipped.
-
- Normally data fields are indexed one at a time. However, up to
- 25 fields may be extracted at one time, to provide global indexing
- capability. Globally indexed fields must all be of the same type,
- with no mixing of numeric and character-based fields.
-
- NOTE: db_name is the filename path\prefix of the data base.
-
- Generates GBXTRACT.ERR file if a processing error is encountered.
-
-
- GBXTRACT command line format:
-
- GBXTRACT gbg_name scr_name indexnum [indexnum ...]<ENTER>
-
- gbg_name: Name of the data file for the data base.
- Default name: db_name.GBG
-
- scr_name: Name of the script file for the data base.
- Default name: db_name.SCR
-
- indexnum: number of data field to extract (integer)
- A max of 25 field numbers may be entered.
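-
- A minimal sketch of the term parsing described above, assuming
- that stop words are compared without regard to case (STOPWORD.LST
- entries are stored in capitals). The function name and arguments
- are hypothetical; this is not the GBXTRACT source.
-
```python
def extract_terms(text, stop_chars, stop_words=None, numeric=False):
    """Split one field's text into index terms.

    stop_chars: characters that end a term (from the .SCR stop string).
    stop_words: set of upper-case words to drop, or None when the stop
                word flag is 0.
    numeric:    when False, spaces at the start of each term are skipped.
    """
    terms, current = [], []

    def flush():
        term = "".join(current)
        current.clear()
        if not numeric:
            term = term.lstrip(" ")
        if term and (stop_words is None or term.upper() not in stop_words):
            terms.append(term)

    for ch in text:
        if ch in stop_chars:
            flush()                    # a stop character ends the term
        else:
            current.append(ch)
    flush()                            # last term has no trailing stop char
    return terms
```
-
- For example, extract_terms("Water in Lansing, MI", " ,", {"IN"})
- keeps Water, Lansing and MI and drops the stop word "in".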
-
- Versions:
-
- 1.9 94/02/17
- Revised to only update counter every 10 records.
- 1.8 94/01/31
- Revised to skip spaces at beginning of extracted character term.
- 1.7 94/01/18
- Added tests for available disk space.
- 1.6 93/11/03
- Changed buffer sizes to allow records up to 32k bytes.
- 1.5 93/10/19
- Changed CR/LF stopchar in non-formatted field to 255 (0xFF).
- 1.4 93/10/04
- Added goto nextfield if non-numeric found in field type N.
- Added data field type n for multiple-entry numeric fields.
- 1.3 93/10/01
- Changed name of Fields structure to DataFields to avoid
- type clash with Fields structure in GSSEARCH.H
- Forced insertion of CR/LF or 0 into stop character list.
- 1.2 93/08/15
- Revised command line message trigger and added output
- of GBXTRACT.ERR file.
- Fixed bug in fread trigger.
- 1.1 93/07/14
-
-
- GBSORTC.EXE 01/06/93
- GBSORTN.EXE 01/06/93
-
- Reads the XTRACTED.TXT file and sorts it, generating SORTED.TXT.
- GBSORTC sorts character field output, GBSORTN sorts numeric field
- output. NOTE: Renamed from SORT.EXE, NSORT.EXE.
-
-
- GBMERGEC.EXE 01/06/93
- GBMERGEN.EXE 01/06/93
-
- Reads SORTED.TXT file and merges duplicate indexes, generating
- MERGED.TXT. GBMERGEC merges character field output, GBMERGEN merges
- numeric field output. NOTE: Renamed from MERGE.EXE, NMERGE.EXE.
-
-
- GBCOUNTC.EXE 01/06/93
- GBCOUNTN.EXE 02/16/94
-
- Reads MERGED.TXT file, counts index terms, and concatenates
- index address information into db_name.LV1 -- db_name.LV4 files.
- GBCOUNTC works on character field output, GBCOUNTN works on numeric
- field output. NOTE: Renamed from COUNT.EXE, NCOUNT.EXE.
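-
- The sort, merge and count stages can be pictured with the sketch
- below. It works on in-memory (term, record offset) pairs and is
- conceptual only: the actual layouts of SORTED.TXT, MERGED.TXT and
- the .LV1 -- .LV4 files are not described here, and the function
- name is hypothetical.
-
```python
from itertools import groupby


def sort_merge_count(entries):
    """entries: iterable of (term, record_offset) pairs.

    Returns a list of (term, count, [offsets]), one row per distinct term.
    """
    ordered = sorted(entries)                          # sort stage
    merged = []
    for term, group in groupby(ordered, key=lambda e: e[0]):
        offsets = [offset for _, offset in group]      # merge duplicates
        merged.append((term, len(offsets), offsets))   # count stage
    return merged
```
-
- This mirrors the character-field path; a numeric path would
- compare terms by value rather than as text.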
-
-
- COLOR.EXE
-
- Changes color of menu screen.
-
- Command line format:
-
- COLOR colorstring <ENTER>
-
- where colorstring consists of a 2 or 3 character color defined as:
-
- intensity (Optional)
- - = low intensity (default)
- + = high intensity
-
- colors: R = Red
- G = Green
- B = Blue
- Y = Yellow
- M = Magenta
- C = Cyan
- W = White
- K = blacK
-
- EXAMPLE: COLOR -WB <ENTER>
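-
- A sketch of how a colorstring could be validated. The helper name
- is hypothetical, accepting lower-case letters is an assumption, and
- the roles of the two color letters are not stated above, so they
- are returned simply as first and second.
-
```python
COLOR_NAMES = {
    "R": "Red", "G": "Green", "B": "Blue", "Y": "Yellow",
    "M": "Magenta", "C": "Cyan", "W": "White", "K": "Black",
}


def parse_colorstring(colorstring):
    """Return (intensity, first_color, second_color)."""
    s = colorstring.upper()
    intensity = "-"                        # low intensity is the default
    if len(s) == 3 and s[0] in "+-":
        intensity, s = s[0], s[1:]
    if len(s) != 2 or any(c not in COLOR_NAMES for c in s):
        raise ValueError("bad colorstring: %r" % colorstring)
    return intensity, COLOR_NAMES[s[0]], COLOR_NAMES[s[1]]
```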
-
-
- GSMENU.EXE, GSMENU.DAT
-
- See GSMENU.DOC.
-
-
- GSBUILD.MNU
-
- Menu data file for GSMENU.EXE. This calls the programs to build
- the data base.
-
-
- PREPB.EXE, PREPC.EXE
-
- Programs which allow interactive generation of the data base control
- files db_name.TMP and db_name.SCR.
-
-
- START.BAT
-
- Batch file which starts the GSMENU.EXE program using the GSBUILD.MNU
- data file.
-
-
- BUILD.BAT
-
- Batch file which starts the build process. This calls INDEX.BAT, and
- provides file control information as part of the command line format:
-
- db_path\INDEX.BAT INF_file db_path\db_name SCR_file
-
-
- INDEX.BAT
-
- Batch file which controls the building of the data base. Generated by
- PREPB.EXE. At each step, if a .ERR file is found the processing will
- stop.
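-
- The stop-on-error rule can be sketched as below. The step list and
- helper names are hypothetical; the real control flow lives in the
- batch file generated by PREPB.EXE.
-
```python
import os


def find_err_files(workdir="."):
    """Return the names of any .ERR files present in workdir."""
    return sorted(n for n in os.listdir(workdir) if n.upper().endswith(".ERR"))


def run_steps(steps, workdir="."):
    """Run (name, callable) steps in order; stop once a .ERR file appears.

    Returns the names of the steps that completed without leaving a
    .ERR file behind.
    """
    done = []
    for name, step in steps:
        step()
        if find_err_files(workdir):
            break                          # an error file halts the build
        done.append(name)
    return done
```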
-
- GBCHKTAG.EXE
-
- Checks a .TAG file for errors. Tests for extra curly brackets,
- control characters, tag names or records which are too long.
- Prints a list of field names, and errors found with line and byte
- number of occurrence.
-
- Command line format:
-
- GBCHKTAG tag_file output_file <ENTER>
-
- tag_file: The path\name of the input tagged text file.
-
- output_file: The path\name of the output data file.
-
- Versions:
- 2.3 04/08/94
- Added record and byte counters.
- Renamed from CHK_TAGS.
- 2.2 12/20/93
- Added test for unmatched curly brackets.
- Removed test for tildes.
- Moved reread section to get rid of bug giving erroneous
- error messages for "non-consecutive" CR and LFs.
- 2.1 11/03/93
- Revised record test length to 32768 bytes.
- 2.0 10/06/93
- Added test for record length.
- 1.0 03/12/93
-
-
- GSSEARCH.EXE, GSSEARCH.HLP
-
- See GSSEARCH.DOC
-
-
- GSMK_HLP.EXE
-
- Program which allows the user to make a database-specific help file,
- which is accessed by GSSEARCH.EXE.
- See GSSEARCH.DOC for more information.
-
- Versions:
-
- 4.0 94/01/11
- Revised to allow general data base help input as prefix.TXT.
- 3.0 93/12/07
- Revised to put help text offsets for GSSEARCH.HLP at actual
- offset instead of offset+1.
- Revised to add CTRL-Z to end of text if not already present.
- 2.0 93/08/15
- Changed to start with index field 0, and to allow
- non-sequential numbers (i.e. skipped numbers).
- 1.0 Unknown
-
-
-
-
- FILE FORMATS & INFORMATION
-
- progname.ERR
-
- Error file generated by programs in the GSBUILD software system.
- Contains error message describing the processing fault encountered.
-
-
- db_name.CFG
-
- See GSSEARCH.DOC.
-
-
- db_name.GBG
-
- See GSSEARCH.DOC.
-
-
- db_name.HLP
-
- See GSSEARCH.DOC.
-
-
- db_name.LV1 ... db_name.LV4
-
- See GSSEARCH.DOC.
-
-
- db_name.TMP
-
- Prototype Configuration file (ASCII file) generated by PREPB.EXE
- from db_name.DBF and interactive input.
-
- Line # Contents
-
- 1 Name of configuration file to make.
- 2..7 Path/names of pointer and data files.
- 8 # data fields. (Y)
- 9 # indexed fields (X)
- 10 # browse formats specified
- 11 Title line
- 12..X+11 Indexed field data lines
- Index name (12 chars max)
- tilde character (stop read marker)
- Index type (integer)
- 1 = numeric,
- 3 = character;
- Data field number (starting with 0)
- X+12..X+Y+11 Data field data lines
- Data field name (10 chars max)
- tilde character (stop read marker)
- Data field length. For Numeric fields
- the number of decimal places is
- indicated by an integer following a
- slash (/), e.g. 5/2.
- Data field type (character)
- C = character,
- c = character (long),
- M = memo,
- N = numeric;
- X+Y+12 Browse format #1 header line
- X+Y+13 Browse format #1 data line
- X+Y+14 Browse format #2 header line
- X+Y+15 Browse format #2 data line
- ...
-
- Example:
-
- HCDN.CFG
- \HCDN\HCDN.LV1~
- \HCDN\HCDN.LV2~
- \HCDN\HCDN.LV3~
- \HCDN\HCDN.GBG~
- \HCDN\HCDN.HLP~
- \HCDN\HELP.GBG~
- 10
- 6
- 2
- *** HCDN TEST 06/08/93 *** ~
- Sta_Num ~ 3 0
- Sta_Name ~ 3 1
- Region ~ 3 2
- Sub_Region ~ 3 3
- Drain_Area ~ 1 4
- State ~ 3 7
- STA_NUM ~ 8 C
- STA_NAME ~ 48 C
- REGION ~ 51 C
- SUB_REGION~ 51 C
- DRAIN_AREA~ 8/2 N
- NUM_YEARS ~ 3/0 N
- COMMENTS ~ 500 C
- STATE ~ 2 C
- LATITUDE ~ 6/4 N
- LONGITUDE ~ 7/4 N
- -1StaNum Station Name State Drn Area Lat Lon ~
- 1 1 8 2 10 25 8 36 2 5 40 10 9 52 6 10 60 7 0 0 0 0 0 0 0 0 0 0 0 0
- -2Sta_Num Station Name Area ~
- 1 0 8 2 9 40 5 52 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
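-
- A minimal reader for this layout, written against the example above
- (where each data field line gives the length before the type). The
- function name and the keys of the returned dictionary are
- hypothetical, and browse format lines are kept as raw text because
- their columns are not described here.
-
```python
def parse_tmp(lines):
    """Parse the lines of a db_name.TMP file into a dictionary."""
    lines = [ln.rstrip("\r\n") for ln in lines]
    config_name = lines[0].strip()
    files = [ln.split("~")[0].strip() for ln in lines[1:7]]
    n_fields, n_indexes, n_browse = (int(lines[i]) for i in (7, 8, 9))
    title = lines[10].split("~")[0].strip()

    pos = 11
    indexes = []
    for ln in lines[pos:pos + n_indexes]:
        name, rest = ln.split("~", 1)
        index_type, field_no = rest.split()
        indexes.append((name.strip(), int(index_type), int(field_no)))
    pos += n_indexes

    fields = []
    for ln in lines[pos:pos + n_fields]:
        name, rest = ln.split("~", 1)
        length, field_type = rest.split()           # e.g. "8/2 N"
        width, _, decimals = length.partition("/")  # decimals may be absent
        fields.append((name.strip(), field_type, int(width), int(decimals or 0)))
    pos += n_fields

    # Each browse format is a header line followed by a data line.
    browse = [(lines[pos + 2 * i], lines[pos + 2 * i + 1]) for i in range(n_browse)]
    return {"config_name": config_name, "files": files, "title": title,
            "indexes": indexes, "fields": fields, "browse": browse}
```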
-
-
- db_name.TAG TAGGED TEXT file (ASCII file)
-
- Contains data field-tagged record data. The record data is listed
- sequentially by data field. Fields must appear in the same order in
- the record as the list of field names in db_name.SCR. Only those
- fields with data should appear in the record. Each record must
- end with the {EOR} (end of record) field. Records may be arranged
- with one field per line or may be concatenated on one line. Blank
- lines between records are ignored.
-
- Field names -- MUST be delimited by left and right curly brackets;
- -- may be no more than ten characters, excluding brackets;
- -- may not contain blanks;
-
- Data may contain hard carriage returns. This allows retention of
- format for tables.
-
- Data MAY NOT CONTAIN tilde (~) characters. These are used as
- delimiting characters within the WORDNDX.TXT file.
-
- Data MAY NOT CONTAIN any control characters except carriage
- return (CR) and line feed (LF).
-
- GBCHKTAG.EXE (formerly CHK_TAGS.EXE) may be used to check a TAG
- file for the existence of tilde characters, control characters, as
- well as other format errors.
-
- Example:
-
- {OFFICE}WRD
- {ROOM}SUITE 5
- {MAILCITY}Lansing
- {STATE}MI
- {WACODE}517
- {FTSNO}374-1608
- {DEPT}DOI
- {BUREAU}USGS
- {CTRLDATE}06/25/89
- {ELEV} 127.3
- {FIRST}Brian
- {MI}D.
- {LAST}Abbott
- {EOR}
- ... more records ...
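-
- A small reader for this format, as a sketch. It assumes the whole
- file fits in memory and silently drops a trailing record that has
- no {EOR}; the function name is hypothetical.
-
```python
import re

# A field name: 1 to 10 characters inside curly brackets, no blanks.
_TAG = re.compile(r"\{([^{}\s]{1,10})\}")


def parse_tagged(text):
    """Split tagged text into records; each is a list of (name, data)."""
    records, current = [], []
    # re.split with one group yields [before, name1, data1, name2, data2, ...]
    pieces = _TAG.split(text)
    for name, data in zip(pieces[1::2], pieces[2::2]):
        if name == "EOR":
            records.append(current)        # {EOR} closes the record
            current = []
        else:
            # Drop only the line break before the next tag; hard
            # carriage returns inside the data are kept.
            current.append((name, data.rstrip("\r\n")))
    return records
```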
-
-
- SIZE.TXT file (ASCII file)
-
- Contains pointer data for each indexed field in the data base.
- Default name: SIZE.TXT
- NOTE: SIZE.TXT is produced by GBMK_ADR.EXE.
-
- Example:
-
- 2 1000 100 0 0 0
- 2 1000 214 60 3200 0
- 2 1000 47 120 10048 0
- 2 1000 2 180 11552 0
- 2 1000 12 240 11616 0
- 2 1000 95 300 12000 0
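-
- A sketch of reading this file, assuming six integers per line as in
- the example. The meaning of each column is not given here, so rows
- are returned as plain lists; the function name and the max_rows
- argument are hypothetical.
-
```python
def read_size_rows(lines, max_rows=None):
    """Return one list of six integers per non-blank line.

    max_rows: optional limit, e.g. the number of indexed fields.
    """
    rows = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        values = [int(v) for v in line.split()]
        if len(values) != 6:
            raise ValueError("line %d: expected 6 integers, got %d"
                             % (n, len(values)))
        rows.append(values)
    if max_rows is not None and len(rows) > max_rows:
        raise ValueError("too many lines of data: %d > %d"
                         % (len(rows), max_rows))
    return rows
```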
-
-
- db_name.SCR SCRIPT DATA FILE (ASCII file)
-
- File containing information on the indexing parameters for the
- data base.
-
- For EACH data field contains:
-
- the data field name, NOTE: Data field name must be
- ended with a tilde (~) character.
- Maximum of 10 characters.
-
- data field indexing flag, 0 = do not index field
- 1 = index field
-
- retain format flag, Allows retention of format of
- tables, etc. Setting this to 1
- for numeric fields allows retention
- of leading blanks for formatting.
-
- 0 = wrap data;
-
- Leading spaces are eliminated.
-
- CR/LF characters are converted
- to spaces except when this would
- result in two consecutive spaces.
- Consecutive CR/LF characters are
- converted to a single space.
-
- 1 = retain format;
-
- CR/LF characters are converted
- to 00.
-
- stop character string, List of characters to use as
- delimiters between words for
- indexing.
-
- Ignored if field is not indexed.
- Stop characters should NOT be used
- if the field is numeric.
-
- NOTE: Stop character list strings
- must be ended with a tilde (~)
- character.
-
- S for the stop character string
- denotes the standard stop character
- set for character fields:
-
- space
- ; semicolon
- . period
- : colon
- , comma
- ' apostrophe
- ? question mark
- - dash (minus sign)
- () parentheses
- * asterisk
- _ underscore
- ^ caret
- CR carriage return
- LF line feed
- " double quote
-
- N for the stop character string
- denotes the standard stop character
- set for NUMERIC fields:
-
- space
- CR carriage return
- LF line feed
-
- D for the stop character string
- denotes the standard stop character
- set for DATE fields:
-
- space
- CR carriage return
- LF line feed
-
- NOTE: Processing may be considerably
- faster if the most likely stop
- characters (i.e. space, comma,
- period, etc.) are placed early in
- the list.
-
- stop words use flag, 0 = index all words,
- 1 = DO NOT INDEX words found in
- STOPWORD.LST file;
-
- Ignored if field is not indexed.
-
- See documentation on STOPWORD.LST
- file.
-
- data field type C = Character, short
- c = Character, long
- D = Date (mm/dd/yy)
- M = Memo
- N = Numeric, first word of field
- is the number. Subsequent words
- in the field are ignored.
- n = Numeric, with multiple numbers
- in the same data field.
- Non-numeric words are ignored.
- This type is useful for indexing
- numeric tables.
-
- NOTE: Commas will be removed from
- N and n numbers.
-
- Example:
-
- field index format stop stopword field
- name flag flag characters flag type
-
-
- OFFICE~ 1 0 S~ 0 C
- ROOM~ 1 0 S~ 0 C
- MAILSTOP~ 1 0 !@# %-_~ 0 C
- MAILCITY~ 1 0 S~ 0 C
- STATE~ 1 0 S~ 0 C
- WACODE~ 1 0 S~ 0 C
- FTSNO~ 1 0 S~ 0 C
- WEXCH~ 1 0 S~ 0 C
- WEXT~ 1 0 S~ 0 C
- DEPT~ 1 0 S~ 0 C
- BUREAU~ 1 0 S~ 0 C
- REMARKS~ 1 0 S~ 1 C
- CTRLDATE~ 1 0 D~ 0 D
- ELEV~ 1 1 N~ 0 N
- FIRST~ 1 0 S~ 0 C
- MI~ 1 0 S~ 0 C
- LAST~ 1 0 S~ 0 C
- EOR~ 0 1 S~ 0 C
-
- NOTES: The STATE field was flagged to not use stop words because
- their use would eliminate OR, IN and ND. Eliminating their
- use in the other fields (except REMARKS) also speeds
- processing.
-
- The CTRLDATE field has a format of MM/DD/YY. It was
- flagged to use the D stop character set, which does NOT
- include the / character, so it would not be broken on the
- / characters.
-
- The ELEV field is numeric. The stop character flag is
- therefore set to N, signifying the standard numeric stop
- characters (space,CR,LF). It is also formatted with
- leading spaces to get the decimal places to line up in the
- browse display. In order to retain them the field was
- flagged to retain the format.
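-
- One way to read a script line, as a sketch. It assumes the column
- order shown in the example and that a stop character string never
- begins with a space; the function name and dictionary keys are
- hypothetical.
-
```python
import re

_SCR_LINE = re.compile(
    r"^\s*([^~]{1,10})~\s*([01])\s+([01])\s+([^~]*)~\s*([01])\s+(\S)\s*$")

# Standard stop character sets selected by S, N and D.
STANDARD_SETS = {
    "S": " ;.:,'?-()*_^\r\n\"",
    "N": " \r\n",
    "D": " \r\n",
}


def parse_scr_line(line):
    """Parse one data field line of a db_name.SCR file."""
    m = _SCR_LINE.match(line)
    if m is None:
        raise ValueError("bad script line: %r" % line)
    name, index, keep, stops, stopwords, field_type = m.groups()
    return {
        "name": name.strip(),
        "index": index == "1",
        "retain_format": keep == "1",
        # S, N and D expand to a standard set; anything else is literal.
        "stop_chars": STANDARD_SETS.get(stops, stops),
        "use_stopwords": stopwords == "1",
        "type": field_type,
    }
```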
-
-
- STOPWORD.LST (ASCII file)
-
- List of words to be excluded from indexing. Words must be arranged
- one word to a line. All words will be converted to all capitals by
- the software. Words do not need to be alphanumerically sorted.
- Words must be continuous, i.e. no spaces within words. There may be
- no blank lines between words.
-
- NOTE: The normal stop word list contains IN, OR, and ND. Therefore
- the stopwords should not be used when indexing a field which contains
- state abbreviations, since the abbreviations for Indiana, Oregon, and
- North Dakota would be excluded.
-
- Maximum number of words = 200.
- Maximum length of word = 10 characters.
-
- Example:
-
- ABOUT
- AFTER
- ALL
- ALSO
- AMONG
- AN
- AND
- ANY
- ... more words ...
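-
- The rules above can be checked with a short loader such as the
- sketch below. The function name is hypothetical, and raising an
- error on a rule violation is an assumption about how a checker
- might behave, not a description of the GSBUILD programs.
-
```python
MAX_WORDS = 200
MAX_LENGTH = 10


def load_stopwords(lines):
    """Return the stop words as a set of upper-case strings."""
    words = set()
    count = 0
    for n, line in enumerate(lines, start=1):
        word = line.strip()
        if not word:
            raise ValueError("line %d: blank lines are not allowed" % n)
        if len(word.split()) != 1:
            raise ValueError("line %d: spaces within a word" % n)
        if len(word) > MAX_LENGTH:
            raise ValueError("line %d: word longer than %d characters"
                             % (n, MAX_LENGTH))
        count += 1
        if count > MAX_WORDS:
            raise ValueError("more than %d words" % MAX_WORDS)
        words.add(word.upper())            # stored in all capitals
    return words
```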
-
-
- XTRACTED.TXT (Binary file)
-
- This file is generated by GBXTRACT.EXE, and is normally deleted
- after each index extraction.
-
- Contains information for each "term" indexed for the specified field.
- The information for each index term is contained in a block of 40
- bytes, consisting of a 10 byte header and a 30 byte word area, which
- is arranged as follows:
-
- Bytes Use Type
-
- 0 .. 1 Record length in db_name.GBG file; int
- 2 .. 5 Record offset in db_name.GBG file; long
- 6 .. 7 # of word in data field; int
- 8 .. 9 Data field number; int
- 10 .. 39 Term, padded with trailing spaces char[30]
- Character fields start at byte 10.
- Numeric fields are right justified with the
- decimal point at byte 30. If there is no
- decimal point the ones digit is at byte 29.
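-
- The 40-byte block can be decoded as sketched below. Byte order and
- signedness are assumptions (little-endian, 16-bit unsigned int,
- 32-bit signed long, as is typical of DOS compilers), as is the
- Latin-1 decoding of the term; the names are hypothetical.
-
```python
import struct

# 2 + 4 + 2 + 2 byte header, then a 30 byte term area = 40 bytes.
BLOCK = struct.Struct("<HlHH30s")


def read_blocks(data):
    """Yield (rec_length, rec_offset, word_no, field_no, term) tuples."""
    if len(data) % BLOCK.size:
        raise ValueError("data is not a whole number of 40-byte blocks")
    for rec_len, rec_off, word_no, field_no, raw in BLOCK.iter_unpack(data):
        # Strip only the trailing pad; leading spaces of a right
        # justified numeric term are kept.
        term = raw.decode("latin-1").rstrip(" ")
        yield rec_len, rec_off, word_no, field_no, term
```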
-
-